Use the one flatbuffer to store all lists #489

Open. Wants to merge 18 commits into base: master

Conversation

Collaborator

@atuchin-m atuchin-m commented Jun 20, 2025

This PR moves from per-NetworkFilterList flatbuffers to a single flatbuffer per Engine.
It doesn't affect the performance metrics, but it opens up the possibility of putting cosmetic filters into the same flatbuffer.
It also simplifies the serialization and deserialization code.

Notes:

  • It uses the original algorithm and structures without further optimizations; those are left for a follow-up, since the diff is already big enough.
  • insert_dup was dropped: it currently has no effect with flatbuffers (we compared the flatbuffer offsets, not the unique ids). It may make sense to restore it in a future version.
  • NetworkFilterLists::new is left as-is for benches and tests to avoid an excessive diff.
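
To make the layout change concrete, here is a minimal sketch of the idea using only the standard library. It is not the flatbuffers-based code from this PR: `UnifiedStore`, `ListKind`, and the `(start, len)` table are hypothetical stand-ins that only illustrate keeping every list in one buffer and handing out borrowed views on demand.

```rust
/// Hypothetical list categories; the real engine has its own set.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ListKind {
    Exceptions = 0,
    Importants = 1,
    Filters = 2,
}

/// One contiguous buffer plus a (start, len) table, one entry per list.
/// This stands in for "one flatbuffer per Engine".
pub struct UnifiedStore {
    data: Vec<u64>,
    ranges: [(usize, usize); 3],
}

impl UnifiedStore {
    /// Packs the per-list vectors into a single buffer.
    pub fn build(lists: [Vec<u64>; 3]) -> Self {
        let mut data = Vec::new();
        let mut ranges = [(0, 0); 3];
        for (i, list) in lists.iter().enumerate() {
            ranges[i] = (data.len(), list.len());
            data.extend_from_slice(list);
        }
        UnifiedStore { data, ranges }
    }

    /// Returns a borrowed view of one list; nothing is copied.
    pub fn list(&self, kind: ListKind) -> &[u64] {
        let (start, len) = self.ranges[kind as usize];
        &self.data[start..start + len]
    }

    /// Total number of stored entries across all lists.
    pub fn len(&self) -> usize {
        self.data.len()
    }
}
```

With a layout like this, serialization only has to write one buffer, which is the simplification the description refers to.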

@atuchin-m atuchin-m self-assigned this Jun 20, 2025
pub(crate) fn filter_list(&self) -> fb::NetworkFilterList<'_> {
unsafe { fb::root_as_network_filter_list_unchecked(self.data()) }
pub(crate) fn root(&self) -> fb::Engine<'_> {
unsafe { fb::root_as_engine_unchecked(self.data()) }

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<NetworkFilterList>>,
>>(Engine::VT_LISTS, None)
.unwrap()
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

None,
)
.unwrap()
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

@github-actions github-actions bot left a comment

Rust Benchmark

| Benchmark suite | Current: 80eb8f6 | Previous: 4738d3f | Ratio |
|---|---|---|---|
| rule-match-browserlike/brave-list | 2244879462 ns/iter (± 15302810) | 2252126465 ns/iter (± 13369021) | 1.00 |
| rule-match-first-request/brave-list | 992709 ns/iter (± 6106) | 1000284 ns/iter (± 8364) | 0.99 |
| blocker_new/brave-list | 148347662 ns/iter (± 1114694) | 150674514 ns/iter (± 1975624) | 0.98 |
| blocker_new/brave-list-deserialize | 63908754 ns/iter (± 1059139) | 62360204 ns/iter (± 1865060) | 1.02 |
| memory-usage/brave-list-initial | 16282069 ns/iter (± 3) | 16225933 ns/iter (± 3) | 1.00 |
| memory-usage/brave-list-initial/max | 64817658 ns/iter (± 3) | 64817658 ns/iter (± 3) | 1 |
| memory-usage/brave-list-initial/alloc-count | 1514486 ns/iter (± 3) | 1514650 ns/iter (± 3) | 1.00 |
| memory-usage/brave-list-1000-requests | 2516503 ns/iter (± 3) | 2505592 ns/iter (± 3) | 1.00 |
| memory-usage/brave-list-1000-requests/alloc-count | 66588 ns/iter (± 3) | 66070 ns/iter (± 3) | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@atuchin-m atuchin-m force-pushed the the-one-fb branch 2 times, most recently from f8e5204 to 27de614 Compare June 20, 2025 20:47
Member

@kdenhartog kdenhartog left a comment

I checked over the unsafe usage and this matches what it was previously. We changed the return type here, but it's not a concern and the implementation matches what it previously was. I'm removing those alerts

Comment on lines 71 to 75
// TODO: do we need another feature for this?
#[cfg(feature = "unsync-regex-caching")]
pub(crate) type SharedStateRef = std::rc::Rc<SharedState>;
#[cfg(not(feature = "unsync-regex-caching"))]
pub(crate) type SharedStateRef = std::sync::Arc<SharedState>;
Collaborator

Same feature for both is fine, but it would be nice to rename it since it'd no longer be strictly about regex caching

Collaborator

On that note though, I don't think we actually need any refcounted pointers for this? We could pass the &SharedState in as an argument to whatever functions require it, or give Blocker a <'a> lifetime so it can own a shared_state: &'a SharedState field

Collaborator Author

That sounds logical, but the issue is that Engine (the primary owner of shared_state) also owns a Blocker instance (which also stores shared_state).

A struct member cannot borrow from another member of the same struct without something like https://docs.rs/ouroboros/latest/ouroboros/attr.self_referencing.html
Without Rc, Blocker would become Blocker<'a> with a limited lifetime and couldn't be stored as a member of Engine, which would mean major changes to the codebase.
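
The ownership constraint described above can be sketched as follows. The types here are simplified, hypothetical stand-ins (the real Engine, Blocker, and SharedState carry far more). The point is only that a reference-counted pointer lets two fields of the same struct share one value, where a plain `&'a SharedState` field would make the struct self-referential.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for the state shared by Engine and Blocker.
pub struct SharedState {
    pub list_count: usize,
}

/// Blocker keeps its own handle to the shared state.
pub struct Blocker {
    shared_state: Rc<SharedState>,
}

impl Blocker {
    pub fn list_count(&self) -> usize {
        self.shared_state.list_count
    }
}

/// Engine is the primary owner, and it also owns a Blocker.
///
/// With `Blocker<'a> { shared_state: &'a SharedState }` this struct would need
/// a field that borrows from a sibling field, which safe Rust does not allow
/// without a helper crate such as ouroboros.
pub struct Engine {
    shared_state: Rc<SharedState>,
    blocker: Blocker,
}

impl Engine {
    pub fn new(list_count: usize) -> Self {
        let shared_state = Rc::new(SharedState { list_count });
        let blocker = Blocker {
            shared_state: Rc::clone(&shared_state),
        };
        Engine { shared_state, blocker }
    }

    pub fn blocker(&self) -> &Blocker {
        &self.blocker
    }

    /// Number of live handles to the shared state (Engine plus Blocker).
    pub fn handle_count(&self) -> usize {
        Rc::strong_count(&self.shared_state)
    }
}
```

Swapping `Rc` for `std::sync::Arc` gives the thread-safe variant, which is what the feature-gated type alias in the reviewed snippet selects between.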

Collaborator Author

The feature has been renamed to single-thread.

@atuchin-m atuchin-m changed the title [DRAFT] Use the one flatbuffer to store all lists Use the one flatbuffer to store all lists Jun 25, 2025
@atuchin-m atuchin-m marked this pull request as ready for review June 25, 2025 11:44
[puLL-Merge] - brave/adblock-rust@489

Description

This PR performs a major architectural refactoring of the adblock-rust library, changing from multiple specialized filter lists to a single unified flatbuffer-based storage system. The key changes include:

  1. Feature renaming: unsync-regex-caching → single-thread to better reflect its purpose
  2. Unified storage: Consolidates multiple NetworkFilterList fields in Blocker into a single FilterDataContext with flatbuffer storage
  3. New builder pattern: Introduces FlatBufferBuilder to construct the unified storage format
  4. API simplification: Removes the separate Blocker constructor in favor of building through Engine::from_filter_set

The motivation appears to be reducing memory overhead and improving performance by using a single flatbuffer instead of multiple separate filter lists.

Possible Issues

  1. Breaking API changes: The refactoring significantly changes internal APIs and may break existing code that depends on the internal structure
  2. Complexity increase: The new flatbuffer-based architecture adds significant complexity to the codebase, potentially making it harder to maintain and debug
  3. Memory safety concerns: The unsafe code in flatbuffer handling could introduce memory safety issues if not carefully managed
  4. Error handling: The refactoring may have introduced new error paths that aren't properly handled, particularly around flatbuffer parsing
  5. Performance regression risk: While intended to improve performance, such major architectural changes could introduce performance regressions that may not be immediately apparent

Security Hotspots

  1. Unsafe flatbuffer operations (src/filters/unsafe_tools.rs, lines 86-89): The root_as_engine_unchecked function bypasses verification, which could lead to memory safety issues if called with invalid data
  2. Raw memory handling (src/filters/unsafe_tools.rs, lines 48-90): The VerifiedFlatbufferMemory struct handles raw bytes and could be vulnerable to buffer overflow or use-after-free if the verification logic is flawed
  3. Unchecked indexing (src/filters/fb_network.rs, line 125): Direct array access without bounds checking could lead to out-of-bounds reads
  4. Hash map key manipulation (src/filters/flat_builder.rs, lines 49-57): Direct manipulation of domain hash mappings could potentially be exploited if an attacker can control the input hashes
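
The first two hotspots both rest on a "verify once, then read unchecked" pattern. A minimal, standard-library-only sketch of that pattern follows. `VerifiedBytes` and its toy length-prefix format are hypothetical and are not the VerifiedFlatbufferMemory implementation from this PR.

```rust
/// Bytes that have passed validation exactly once, at construction time.
/// Toy format (an assumption for this sketch): byte 0 is the payload length,
/// followed by exactly that many payload bytes.
pub struct VerifiedBytes {
    raw: Vec<u8>,
}

impl VerifiedBytes {
    /// The only way to obtain a VerifiedBytes, so validation cannot be skipped.
    pub fn from_raw(raw: Vec<u8>) -> Result<Self, &'static str> {
        let declared = *raw.first().ok_or("empty buffer")? as usize;
        if raw.len() != declared + 1 {
            return Err("declared length does not match buffer size");
        }
        Ok(VerifiedBytes { raw })
    }

    /// Reads the payload without re-checking the slice bounds.
    pub fn payload(&self) -> &[u8] {
        let len = self.raw[0] as usize;
        // SAFETY: from_raw checked that raw.len() == len + 1, and `raw` is
        // private and never mutated afterwards, so 1..len + 1 is in bounds.
        unsafe { self.raw.get_unchecked(1..len + 1) }
    }
}
```

The safety of the unchecked read depends entirely on the constructor being the only entry point, which is why a flaw in the verification step would undermine every later access.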
Changes

Cargo.toml

  • Renames unsync-regex-caching feature to single-thread for clarity
  • Updates default features to use the new name

src/blocker.rs

  • Removes individual filter list fields, replacing with FilterDataContext
  • Adds getter methods that construct NetworkFilterList instances on demand
  • Updates constructor to use FilterDataContext instead of individual lists

src/engine.rs

  • Removes Engine::new() method in favor of Default implementation
  • Updates from_filter_set to use new FlatBufferBuilder
  • Modifies serialization/deserialization to work with unified storage

src/filters/flat_builder.rs (new file)

  • Implements FlatBufferBuilder for constructing unified flatbuffer storage
  • Handles filter categorization and optimization during build process
  • Contains the main logic for converting from multiple filter types to unified storage

src/filters/fb_network.rs

  • Adds FilterDataContext and FilterDataContextRef types for managing shared filter data
  • Updates FlatNetworkFilter to work with the new context system
  • Removes the old FlatNetworkFiltersListBuilder class

src/data_format/storage.rs

  • Simplifies serialization format to use single flatbuffer instead of multiple lists
  • Updates deserialization to reconstruct FilterDataContext from stored data

Benchmark files

  • Updates to use Engine instead of separate Blocker instances
  • Removes ResourceStorage usage in favor of integrated resource handling
sequenceDiagram
    participant Client
    participant Engine
    participant FlatBufferBuilder
    participant FilterDataContext
    participant Blocker

    Client->>Engine: from_filter_set(filters, optimize)
    Engine->>FlatBufferBuilder: make_flatbuffer(network_filters, optimize)
    FlatBufferBuilder->>FlatBufferBuilder: categorize filters by type
    FlatBufferBuilder->>FlatBufferBuilder: build unified flatbuffer
    FlatBufferBuilder-->>Engine: VerifiedFlatbufferMemory
    Engine->>FilterDataContext: new(memory)
    FilterDataContext-->>Engine: FilterDataContextRef
    Engine->>Blocker: from_context(filter_data_context)
    Blocker-->>Engine: Blocker instance
    Engine-->>Client: Engine instance
    
    Client->>Engine: check_network_request(request)
    Engine->>Blocker: check(request, resources)
    Blocker->>Blocker: get_list(filter_type)
    Blocker->>FilterDataContext: access filter data
    FilterDataContext-->>Blocker: NetworkFilterList
    Blocker-->>Engine: BlockerResult
    Engine-->>Client: CheckResult

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Rust Benchmark'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10.

| Benchmark suite | Current: c0763c2 | Previous: 4738d3f | Ratio |
|---|---|---|---|
| blocker_new/brave-list-deserialize | 71283861 ns/iter (± 2428053) | 62360204 ns/iter (± 1865060) | 1.14 |

This comment was automatically generated by workflow using github-action-benchmark.
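
For reference, the reported ratio is consistent with dividing the current time by the previous one and comparing the result against the 1.10 threshold. The sketch below assumes that is how the check works; the function names are hypothetical and are not taken from github-action-benchmark.

```rust
/// Ratio of current to previous benchmark time; above 1.0 means slower.
pub fn ratio(current_ns: u64, previous_ns: u64) -> f64 {
    current_ns as f64 / previous_ns as f64
}

/// True when the slowdown exceeds the alert threshold.
pub fn is_regression(current_ns: u64, previous_ns: u64, threshold: f64) -> bool {
    ratio(current_ns, previous_ns) > threshold
}
```

Under that assumption, 71283861 / 62360204 rounds to 1.14 and trips the alert, while the earlier run of the same benchmark (63908754 ns/iter) stays well below the threshold.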

@atuchin-m atuchin-m requested review from boocmp and antonok-edm June 25, 2025 12:44
@@ -0,0 +1,337 @@
//! Builder for creating flatbuffer with serialized engine.
Collaborator

Maybe rename it to fb_builder.rs?

Collaborator Author

We can. @antonok-edm WDYT?

Collaborator

@boocmp boocmp Jun 26, 2025

I'm asking because we have fb_network.rs and flat_filter_map.rs, and I suppose fb stands for flatbuffers and flat for flat containers.

5 participants